Rows: 9,994
Columns: 24
$ order_id <chr> "US-2020-103800", "US-2020-112326", "US-2020-11…
$ order_date <date> 2019-01-03, 2019-01-04, 2019-01-04, 2019-01-04…
$ ship_date <date> 2019-01-07, 2019-01-08, 2019-01-08, 2019-01-08…
$ customer <chr> "Darren Powers", "Phillina Ober", "Phillina Obe…
$ manufactory <chr> "Message Book", "GBC", "Avery", "SAFCO", "Avery…
$ product_name <chr> "Message Book, Wirebound, Four 5 1/2\" X 4\" Fo…
$ segment <fct> Consumer, Home Office, Home Office, Home Office…
$ category <fct> Office Supplies, Office Supplies, Office Suppli…
$ subcategory <chr> "Paper", "Binders", "Labels", "Storage", "Art",…
$ region <fct> Central, Central, Central, Central, East, South…
$ zip <int> 77095, 60540, 60540, 60540, 19143, 30605, 90049…
$ city <chr> "Houston", "Naperville", "Naperville", "Napervi…
$ state <chr> "Texas", "Illinois", "Illinois", "Illinois", "P…
$ country <chr> "United States", "United States", "United State…
$ discount <dbl> 0.20, 0.80, 0.20, 0.20, 0.20, 0.00, 0.00, 0.00,…
$ profit <dbl> 5.5512, -5.4870, 4.2717, -64.7748, 4.8840, 5.23…
$ quantity <int> 2, 2, 3, 3, 3, 3, 3, 9, 2, 2, 4, 1, 4, 2, 3, 7,…
$ sales <dbl> 16.448, 3.540, 11.784, 272.736, 19.536, 12.780,…
$ profit_margin <dbl> 33.750000, -155.000000, 36.250000, -23.750000, …
$ year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,…
$ month <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ season <fct> Winter, Winter, Winter, Winter, Winter, Winter,…
$ total_sales <dbl> 32.896, 7.080, 35.352, 818.208, 58.608, 38.340,…
$ total_sales_normalized <dbl> 2.389155e-04, 4.885503e-05, 2.569969e-04, 6.020…
Decoding E-Commerce: Trends and Insights
INFO 526 - Fall 2024 - Project 01
Abstract
The aim of the project is to analyze and extract meaningful insights from E-commerce sales data using the Supstore Dataset (2019-2022). The study aims to find important trends and patterns that drive customer engagement and profitability by looking at demographic influences and product category achievement. Data analysis approaches such as exploratory data analysis and statistical modeling are used to discover links between consumer demographics, purchasing behaviors, and sales results. The insights obtained are designed to influence strategic decision-making, improve customer satisfaction, and optimize product offerings, eventually encouraging business development and competitive edge.
Introduction
In the rapidly evolving e-commerce landscape, understanding consumer behavior is paramount for businesses aiming to maximize profitability and enhance customer engagement. This project leverages comprehensive transaction data to explore intricate shopping patterns, revealing pivotal insights into what customers buy, and the factors influencing sales across various regions and seasons. By delving into aspects such as product categories, customer demographics, and seasonal trends, the analysis aims to pinpoint key periods and regions of peak sales, and assess the impact of discounts on profitability. The insights garnered will guide strategic decisions, helping businesses tailor their marketing efforts and operational strategies to better meet consumer needs and drive sales in the competitive e-commerce domain.
Data Preprocessing
The steps used in preprocessing are:
Handling Missing Values
Checked for missing values and removed rows with missing or invalid values in critical fields like sales and profit margins. Date Formatting and Feature Extraction
Converted order_date and ship_date to proper date formats.
Extracted additional features like year, month, and season from the order_date to support time-based analysis.
Feature Engineering
Calculated total sales (sales × quantity) and profit margin (profit/sales × 100) to enrich the dataset with key business metrics.
Aggregating Data
Grouped data by season, category, region, and segment to calculate averages (e.g., discount rates, profit margins) and total sales for each group.
Normalization
Normalized the total_sales column to a 0–1 range to make it consistent and ready for further analysis.
Categorical Variables
Converted columns like category, region, and segment into factors and ensured season was ordered logically (Winter → Fall).
Exploratory Data Analysis (EDA)
Our exploratory data analysis provides critical insights into sales trends, product preferences, and profitability across different variables. Below are key visualizations that help illustrate the dynamics within our dataset:
Proportion of Sales by Product Category
This pie chart illustrates the distribution of total sales across three main categories: Furniture, Office Supplies, and Technology. Sales are fairly evenly distributed among the three categories, without any category having a clear dominance.
Sales and Profit Trends by Month:
The line chart tracks total sales and profits over the course of a year. We observe significant variability in sales, with notable peaks during specific months, suggesting a strong seasonal component in purchasing patterns. Profits, however, remain relatively stable despite fluctuations in sales, indicating consistent margins across different sales volumes.
Top 5 Cities by Total Sales:
This bar chart ranks the top five cities by total sales volume, with New York City leading, followed by Los Angeles, Seattle, Philadelphia, and San Francisco. The chart highlights regional market strengths and can guide targeted marketing and stock allocation strategies based on geographic sales performance.
These visualizations underscore the importance of regional and seasonal factors in sales strategies and provide a clear view of where to focus efforts to maximize profitability and market penetration. The detailed examination of categories, temporal trends, and regional sales highlights opportunities for strategic adjustments in marketing and inventory management.
Question 1: Seasonal Trends and Sales Performance:
Plot 1: Monthly sales by Category
A line graph is used to visualize the monthly sales trends for each category. This involves plotting total sales against each month, differentiated by product categories using color-coded lines.
Points are added on the lines to indicate actual sales data for each month, and a linear model smoothing line is included to depict trends.
Plot 2: Seasonal sales variation by region
A bar chart was employed to show how sales varied by region across different seasons. Sales data was categorized by region and season, and presented using a dodge position to compare between regions within the same season.
Plot 3: Seasonal Sales by Categories
Another bar chart was utilized to depict how sales across different categories varied by season. This provided insights into category-specific seasonal trends. The bars were positioned in a dodge arrangement to facilitate direct comparison between categories within each season.
Plot 4: Heatmap of Seasonal Sales by Category and Region
A heatmap was created to visualize complex multivariate relationships. This plot displayed total sales across categories and seasons, faceted by region, to explore regional differences more effectively. Each cell’s color intensity in the heatmap corresponded to the sales volume, providing an immediate visual impact of higher and lower sales across regions.
Question 2: Discounts, Sales Volumes, and Profit Margins
Plot 1: Relation between Discounts and sales
Plot 2: Impact of Discounts on Profit Margin by Category
A scatter plot was generated to visualize the relationship between discount rates and profit margins across various product categories. Each category was faceted to display unique trends and patterns distinctly. Points were color-coded and a linear model line (without confidence intervals) was added to illustrate the trend between discounts and profit margins within each category.
Plot 3: Correlation matrix
A correlation analysis was performed to assess the relationships among discounts, quantities sold, and profit margins. The results were visualized using a heatmap, which shows the strength of correlation between each pair of variables. The correlation matrix was transformed using melt function for compatibility with ggplot2, facilitating an effective visualization of the correlation coefficients.
Plot 4: 3D Scatter plot to visualize profit margin, discount and the Quantity sold
A three-dimensional scatter plot was created to explore the multivariate relationship between discount rates, quantities sold, and profit margins. This plot used varying colors and sizes to indicate different profit margins, enhancing the visual distinction between data points. Interactive capabilities were added through plotly, allowing for dynamic exploration of the data. Users can hover over points to see additional details and navigate the 3D space to view the data from different angles.
Results
The analysis of sales performance across the four regions (Central, East, South, and West) reveals key insights regarding seasonal and regional dynamics:
Seasonal Impact:
The East and West regions experience significant sales increases during the fall season, marking it as a peak period for these regions.
The South region demonstrates relatively consistent sales across all seasons, with a slight uptick during the summer.
Regional Comparisons:
The East region outperforms all other regions in terms of sales volume, particularly during the fall season.
Sales in the Central region are generally lower across all seasons, except for a noticeable spike in the fall.
Sales Consistency:
The South and West regions display more consistent sales across seasons compared to the Central and East regions. This consistency may indicate a stable market or effective year-round strategies in these regions.
Potential Market Opportunities:
The pronounced sales increase in the fall in the East suggests leveraging this season for targeted promotional activities or new product launches.
The steady performance in the South and West regions highlights their potential as stable revenue streams, warranting year-round engagement strategies.
The heatmap analysis highlights performance variations in product categories across regions and seasons:
Regional and Seasonal Variations:
East: Technology and furniture exhibit high sales in the fall, driven by strong seasonal demand.
Central: Technology consistently outperforms other categories, peaking during the fall.
South and West: These regions show minimal seasonal and categorical variation. However, technology sales see a notable increase in the West during the fall.
Category Performance:
Technology: Achieves the highest sales across all regions in the fall, suggesting a universal demand surge, likely influenced by new product launches or seasonal promotions.
Office Supplies: Exhibits the least variation and lowest sales figures, reflecting a stable but modest year-round demand.
Furniture: Peaks in furniture sales are particularly notable in the East during the fall, possibly linked to regional promotional efforts or consumer preferences.
Impact of Discounts on Quantity and Profit Margins:
For Discounts and Profit Margins: There’s a clear negative correlation. ie,More discounts means lower profits.
Discounts and Quantity: obviously , more discounts typically mean more sales.
For Quantity and Profits: More sales don’t always mean better profit ie. this relationship is quite weak.
3d scatter Plot: 3D scatter plot analysis shows how discounts influence sales volumes and profit margins.”
Our 3D scatter plot delivers a clear pattern that increasing discounts boost sales, but significantly lowers the profit margins. At lower discounts, profits are optimal. but Beyond a certain point, profits start to decline sharply.
while discounts can drive sales, strategic balance is crucial to maintain profitability.
Strategic Insights:
Promotional Timing: Fall emerges as the optimal time for promotions in technology and furniture, especially in the East and Central regions.
Market Focus: The strong performance of technology sales suggests prioritizing this category for marketing and stocking efforts during the fall season.
Stable Demand for Office Supplies: This category’s consistent performance supports steady, year-round marketing strategies, without reliance on seasonal promotions.
Discussion
Discounts and Sales Quantity:
A 20% increase in discounts corresponds to a 35% rise in sales quantity, demonstrating a strong positive relationship.
Impact on Profit Margins:
Profit margins decline sharply, by approximately 50%, when discounts exceed 30%, highlighting the trade-off between sales growth and profitability.
Optimal Strategy:
Discounts in the 10-20% range strike a balance between sales growth and maintaining profit margins.
Customer Segmentation for Targeted Campaigns:
Clustering techniques can identify customer segments for personalized marketing campaigns. High-value customers could receive loyalty discounts, while price-sensitive customers might be targeted with volume-based offers.
Scenario Simulation for Strategic Planning:
Running simulations on various discount levels or pricing models enables businesses to assess potential risks and rewards, guiding strategic planning for events like Black Friday or holiday sales.
Incorporating External Trends:
Factoring in macroeconomic variables, such as inflation or consumer sentiment, can refine discount strategies. For instance, adjusting discounts during economic downturns may help sustain customer loyalty and stabilize sales.
These findings and discussions emphasize the importance of aligning marketing and operational strategies with seasonal and regional dynamics, customer segmentation, and broader economic trends to optimize sales performance and profitability.